High-resolution sinusoidal analysis for resolving harmonic collisions in music audio signal processing
Many music signals can largely be considered an additive combination of
multiple sources, such as musical instruments or voice. If the musical sources
are pitched instruments, the spectra they produce are predominantly harmonic,
and are thus well suited to an additive sinusoidal model. However,
due to resolution limits inherent in time-frequency analyses, when the harmonics
of multiple sources occupy equivalent time-frequency regions, their
individual properties are additively combined in the time-frequency representation
of the mixed signal. Any such time-frequency point in a mixture
where multiple harmonics overlap produces a single observation from which
the contributions owed to each of the individual harmonics cannot be trivially
deduced. These overlaps are referred to as overlapping partials or harmonic
collisions. If one wishes to infer some information about individual sources in
music mixtures, the information carried in regions where collided harmonics
exist becomes unreliable due to interference from other sources. This interference
has ramifications in a variety of music signal processing applications
such as multiple fundamental frequency estimation, source separation, and
instrumentation identification.
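The resolution limit described above can be illustrated in a few lines of NumPy: two harmonics from different sources, spaced more closely than the DFT bin width, merge into a single spectral peak. The frequencies, frame length, and sample rate here are illustrative choices, not taken from the thesis.

```python
import numpy as np

# Hypothetical example: harmonics from two sources only 5 Hz apart,
# analysed with a 512-point DFT whose bin spacing is fs/n = 15.625 Hz.
fs, n = 8000, 512
t = np.arange(n) / fs
f1, f2 = 1000.0, 1005.0
mixture = np.sin(2 * np.pi * f1 * t) + 0.8 * np.sin(2 * np.pi * f2 * t)

spectrum = np.abs(np.fft.rfft(mixture * np.hanning(n)))
peak_bin = int(np.argmax(spectrum))
# Both harmonics collide into one observation near 1000 Hz; their
# individual amplitudes and frequencies are not separately visible.
print("dominant peak: %.1f Hz" % (peak_bin * fs / n))
```

The single peak is the "single observation" the abstract refers to: the magnitude at that bin reflects the combined contribution of both harmonics.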
This thesis addresses harmonic collisions in music signal processing applications.
As a solution to the harmonic collision problem, a class of signal
subspace-based high-resolution sinusoidal parameter estimators is explored.
Specifically, the direct matrix pencil method, or equivalently, the Estimation
of Signal Parameters via Rotational Invariance Techniques (ESPRIT)
method, is used with the goal of producing estimates of the salient parameters
of individual harmonics that occupy equivalent time-frequency regions. This
estimation method is adapted here to be applicable to time-varying signals
such as musical audio. While high-resolution methods have been previously
explored in the context of music signal processing, previous work has not
addressed whether or not such methods truly produce high-resolution sinusoidal parameter estimates in real-world music audio signals. Therefore, this
thesis answers the question of whether high-resolution sinusoidal parameter
estimators are really high-resolution for real music signals.
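The shift-invariance idea at the heart of ESPRIT and the matrix pencil method can be sketched for a single stationary frame (the thesis's adaptation to time-varying musical signals is more involved; matrix sizes and model order here are illustrative assumptions):

```python
import numpy as np

def esprit_freqs(x, order):
    """Estimate sinusoidal frequencies (in cycles/sample) from one
    frame x via the signal-subspace rotational invariance used by
    ESPRIT / the matrix pencil method. Simplified sketch only."""
    m = len(x) // 2                                   # window height
    H = np.lib.stride_tricks.sliding_window_view(x, m).T
    U, _, _ = np.linalg.svd(H, full_matrices=False)
    Us = U[:, :order]                                 # signal subspace
    # Shifting every window by one sample multiplies each complex
    # exponential component by exp(j*2*pi*f); recover those factors
    # as the eigenvalues of the subspace rotation.
    phi = np.linalg.pinv(Us[:-1]) @ Us[1:]
    return np.angle(np.linalg.eigvals(phi)) / (2 * np.pi)

# Two real sinusoids 5 Hz apart (i.e., 4 complex exponentials), far
# below the ~15.6 Hz bin spacing of a 512-point DFT at fs = 8000 Hz:
fs, n = 8000, 512
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 1000 * t) + 0.8 * np.sin(2 * np.pi * 1005 * t)
print(sorted(f * fs for f in esprit_freqs(x, order=4) if f > 0))
```

Note that each real sinusoid contributes a conjugate pair of exponentials, so the model order is twice the number of sinusoids; in the noiseless case the two frequencies are recovered far beyond the DFT resolution limit.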
This work directly explores the ability of this form of sinusoidal parameter
estimation to resolve collided harmonics, and evaluates the analysis
method in the context of music signal processing applications.
Potential benefits of high-resolution sinusoidal analysis are
examined in experiments involving multiple fundamental frequency estimation
and audio source separation. This work shows that there are indeed
benefits to high-resolution sinusoidal analysis in music signal processing applications,
especially when compared to methods that produce sinusoidal
parameter estimates based on more traditional time-frequency representations.
The benefits of this form of sinusoidal analysis are made most evident
in multiple fundamental frequency estimation applications, where substantial
performance gains are seen. High-resolution analysis in the context of
computational auditory scene analysis-based source separation shows similar
performance to existing comparable methods.
Contrastive Learning for Cross-modal Artist Retrieval
Music retrieval and recommendation applications often rely on content
features encoded as embeddings, which provide vector representations of items
in a music dataset. Numerous complementary embeddings can be derived from
processing items originally represented in several modalities, e.g., audio
signals, user interaction data, or editorial data. However, data of any given
modality might not be available for all items in any music dataset. In this
work, we propose a method based on contrastive learning to combine embeddings
from multiple modalities and explore the impact of the presence or absence of
embeddings from diverse modalities in an artist similarity task. Experiments on
two datasets suggest that our contrastive method outperforms single-modality
embeddings and baseline algorithms for combining modalities, both in terms of
artist retrieval accuracy and coverage. Improvements with respect to other
methods are particularly significant for less popular query artists. We
demonstrate our method successfully combines complementary information from
diverse modalities, and is more robust to missing modality data (i.e., it
better handles retrieving artists whose available modality embeddings differ
from the query artist's).
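As a rough illustration of the contrastive principle, here is a generic symmetric InfoNCE-style loss aligning paired embeddings from two modalities (e.g., audio and editorial) of the same artist. This is a textbook formulation, not the paper's exact loss, projection networks, or temperature:

```python
import numpy as np

def info_nce(za, zb, tau=0.1):
    """Symmetric InfoNCE contrastive loss: row i of za (one modality's
    embedding of artist i) should match row i of zb (another modality's
    embedding of the same artist) against all other rows in the batch.
    Generic sketch; tau and the pairing scheme are assumptions."""
    za = za / np.linalg.norm(za, axis=1, keepdims=True)
    zb = zb / np.linalg.norm(zb, axis=1, keepdims=True)
    logits = za @ zb.T / tau          # cosine similarities / temperature
    labels = np.arange(len(za))

    def xent(l):
        # cross-entropy of the matching pair against the batch
        l = l - l.max(axis=1, keepdims=True)
        p = np.exp(l) / np.exp(l).sum(axis=1, keepdims=True)
        return -np.log(p[labels, labels]).mean()

    return 0.5 * (xent(logits) + xent(logits.T))
```

Minimizing such a loss pulls embeddings of the same artist together across modalities while pushing different artists apart, which is what makes a shared space robust when one modality is missing at query time.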
Supervised and Unsupervised Learning of Audio Representations for Music Understanding
In this work, we provide a broad comparative analysis of strategies for
pre-training audio understanding models for several tasks in the music domain,
including labelling of genre, era, origin, mood, instrumentation, key, pitch,
vocal characteristics, tempo and sonority. Specifically, we explore how the
domain of pre-training datasets (music or generic audio) and the pre-training
methodology (supervised or unsupervised) affects the adequacy of the resulting
audio embeddings for downstream tasks.
We show that models trained via supervised learning on large-scale
expert-annotated music datasets achieve state-of-the-art performance in a wide
range of music labelling tasks, each with novel content and vocabularies. This
can be done in an efficient manner with models containing less than 100 million
parameters that require no fine-tuning or reparameterization for downstream
tasks, making this approach practical for industry-scale audio catalogs.
Within the class of unsupervised learning strategies, we show that the domain
of the training dataset can significantly impact the performance of
representations learned by the model. We find that restricting the domain of
the pre-training dataset to music allows for training with smaller batch sizes
while achieving state-of-the-art in unsupervised learning -- and in some cases,
supervised learning -- for music understanding.
We also corroborate that, while achieving state-of-the-art performance on
many tasks, supervised learning can cause models to specialize to the
supervised information provided, somewhat compromising the model's generality.
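The no-fine-tuning evaluation pattern described above, a frozen embedding with a shallow classifier on top, can be sketched as a plain-NumPy logistic-regression probe. Hyperparameters and the synthetic setup are illustrative assumptions:

```python
import numpy as np

def train_linear_probe(emb, labels, classes, lr=0.05, steps=300):
    """Fit a multinomial logistic-regression probe on frozen
    embeddings by gradient descent. Sketch of the downstream
    evaluation pattern, not any specific paper's protocol."""
    n, d = emb.shape
    W = np.zeros((d, classes))
    Y = np.eye(classes)[labels]                  # one-hot targets
    for _ in range(steps):
        logits = emb @ W
        logits -= logits.max(axis=1, keepdims=True)
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        W -= lr * emb.T @ (p - Y) / n            # gradient step
    return W

def probe_accuracy(W, emb, labels):
    return float((np.argmax(emb @ W, axis=1) == labels).mean())
```

Because only the small weight matrix `W` is trained, the expensive embedding model runs once per track, which is what makes the approach practical for large catalogs.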
Melody Transcription From Music Audio: Approaches and Evaluation
Although the process of analyzing an audio recording of a music performance is complex and difficult even for a human listener, there are limited forms of information that may be tractably extracted and yet still enable interesting applications. We discuss melody--roughly, the part a listener might whistle or hum--as one such reduced descriptor of music audio, and consider how to define it and what use it might be. We go on to describe the results of full-scale evaluations of melody transcription systems conducted in 2004 and 2005, including an overview of the systems submitted, details of how the evaluations were conducted, and a discussion of the results. For our definition of melody, current systems can achieve around 70% correct transcription at the frame level, including distinguishing between the presence or absence of the melody. Melodies transcribed at this level are readily recognizable, and show promise for practical applications.
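A simplified version of frame-level melody scoring in this spirit counts a frame correct when both streams agree the melody is absent, or when both are voiced and the pitches agree within a tolerance. The 50-cent tolerance and the zero-means-unvoiced convention are assumptions for illustration, not the official evaluation code:

```python
import numpy as np

def frame_accuracy(ref_hz, est_hz, tol_cents=50.0):
    """Fraction of frames transcribed correctly, including
    presence/absence of the melody (0 Hz denotes an unvoiced
    frame in both the reference and the estimate)."""
    ref = np.asarray(ref_hz, dtype=float)
    est = np.asarray(est_hz, dtype=float)
    both_unvoiced = (ref == 0) & (est == 0)
    both_voiced = (ref > 0) & (est > 0)
    cents = np.zeros_like(ref)
    cents[both_voiced] = 1200.0 * np.abs(
        np.log2(est[both_voiced] / ref[both_voiced]))
    correct = both_unvoiced | (both_voiced & (cents <= tol_cents))
    return float(correct.mean())
```

For example, a frame estimated a full semitone sharp (about 100 cents) counts as wrong, as does a voiced frame the estimator reports as unvoiced.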
Excitations of single-beauty hadrons
In this work we study the predominantly orbital and radial excitations of
hadrons containing a single heavy quark. We present meson and baryon mass
splittings and ratios of meson decay constants (e.g., and
) resulting from quenched and dynamical two-flavor
configurations. Light quarks are simulated using the chirally improved (CI)
lattice Dirac operator at valence masses as light as MeV.
The heavy quark is approximated by a static propagator, appropriate for the
quark on our lattices ( GeV). We also include some preliminary
calculations of the kinetic corrections to the states, showing,
in the process, a viable way of applying the variational method to three-point
functions involving excited states. We compare our results with recent
experimental findings.
Comment: 23 pages, 18 figures, 17 tables; slight title change (Ed. killjoy);
reference added; version to appear in Phys. Rev.
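The variational method mentioned above is, at its core, a generalized eigenvalue problem (GEVP) on a matrix of correlators: solving C(t) v = lambda C(t0) v yields eigenvalues that decay as exp(-E_n (t - t0)). A generic spectroscopy sketch on synthetic data, not the paper's operators or lattice configurations:

```python
import numpy as np

def gevp_energies(C, t0, t):
    """Variational (GEVP) extraction of energy levels from a
    correlator matrix C[t] of shape (time, N, N): solve
    C(t) v = lam * C(t0) v and read off E_n = -log(lam_n)/(t - t0).
    Generic sketch of the standard method."""
    # Whiten with C(t0)^(-1/2) so an ordinary symmetric
    # eigenproblem applies.
    w0, U0 = np.linalg.eigh(C[t0])
    inv_sqrt = U0 @ np.diag(w0 ** -0.5) @ U0.T
    lam = np.linalg.eigvalsh(inv_sqrt @ C[t] @ inv_sqrt)
    lam = np.sort(lam)[::-1]            # ground state first
    return -np.log(lam) / (t - t0)
```

With as many interpolating operators as states contributing, the eigenvalues are pure exponentials and the energies come out exactly; with fewer operators than states they acquire excited-state contamination, which is why operator-basis choice matters.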